Shuangyong Song

Prompt-Level Reward Specifications for Open-Ended Post-Training

May 28, 2026
Via arXiv

Pairwise Preference Reward and Group-Based Diversity Enhancement for Superior Open-Ended Generation

May 18, 2026
Via arXiv

Awakening Dormant Experts: Counterfactual Routing to Mitigate MoE Hallucinations

Apr 15, 2026
Via arXiv

Dynamic Knowledge Fusion for Multi-Domain Dialogue State Tracking

Mar 11, 2026
Via arXiv

UniARM: Towards a Unified Autoregressive Reward Model for Multi-Objective Test-Time Alignment

Feb 10, 2026
Via arXiv

Stop Rewarding Hallucinated Steps: Faithfulness-Aware Step-Level Reinforcement Learning for Small Reasoning Models

Feb 05, 2026
Via arXiv

ReasonTabQA: A Comprehensive Benchmark for Table Question Answering from Real World Industrial Scenarios

Jan 12, 2026
Via arXiv

Training Report of TeleChat3-MoE

Dec 30, 2025
Via arXiv

Multi-Intent Spoken Language Understanding: Methods, Trends, and Challenges

Dec 12, 2025
Via arXiv

MR-UIE: Multi-Perspective Reasoning with Reinforcement Learning for Universal Information Extraction

Sep 11, 2025
Via arXiv